---
title: Introduction
description: Overview of benchmarking in the Cua agent framework
---

The Cua agent framework uses benchmarks to test the performance of supported models and providers at various agentic tasks.

## Benchmark Types

Computer-agent benchmarks evaluate two key capabilities:

- **Plan Generation**: Breaking down complex tasks into a sequence of actions
- **Coordinate Generation**: Predicting precise click locations on GUI elements

## Using State-of-the-Art Models

The examples below show how to use state-of-the-art (SOTA) vision-language models (VLMs) in the Cua agent framework.

### Plan Generation + Coordinate Generation

**[OS-World](https://os-world.github.io/)** - Benchmark for complete computer-use agents

This leaderboard tests models that understand an instruction and autonomously perform the full sequence of actions needed to complete the task.

```python
# Assumes `ComputerAgent` is imported and `computer` is an initialized computer tool.
# UI-TARS-1.5 is a SOTA VLM that unifies plan generation and coordinate generation,
# which makes it suitable for computer-use agent loops.
agent = ComputerAgent("huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B", tools=[computer])
agent.run("Open Firefox and go to github.com")
# Success! 🎉
```

### Coordinate Generation Only

**[GUI Agent Grounding Leaderboard](https://gui-agent.github.io/grounding-leaderboard/)** - Benchmark for click prediction accuracy

This leaderboard tests models that specialize in finding exactly where to click on screen elements, but that need to be told which specific action to take.

```python
# GTA1-7B is a SOTA coordinate generation VLM
# It can only generate coordinates, not plan:
agent = ComputerAgent("huggingface-local/HelloKKMe/GTA1-7B", tools=[computer])
agent.predict_click("find the button to open the settings")  # e.g. (27, 450)
# This will raise an error:
# agent.run("Open Firefox and go to github.com")
```
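
Click prediction accuracy can be illustrated with a small scoring sketch. Grounding benchmarks commonly count a predicted click as correct when it lands inside the target element's bounding box; whether this leaderboard applies exactly that rule is an assumption here. The `BoundingBox` type and `click_accuracy` function are hypothetical helpers, not part of the Cua API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned target region in pixels (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        # Points on the edges are treated as inside the box.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def click_accuracy(predictions, targets):
    """Fraction of predicted (x, y) clicks that land inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(predictions)


# Illustrative data only: the first click hits its target, the second misses.
predictions = [(27, 450), (300, 40)]
targets = [BoundingBox(10, 440, 60, 470), BoundingBox(0, 100, 50, 150)]
print(click_accuracy(predictions, targets))  # 0.5
```

In this sketch, the predicted coordinates would come from calls such as `agent.predict_click(...)`, and the target boxes from the benchmark's annotations.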

### Composed Agent

The Cua agent framework also supports composed agents, which pair a coordinate generation model with a separate plan generation model so that a grounding-only model can still run full tasks. Any LiteLLM-supported model can serve as the plan generation model.

```python
# GTA1-7B can be paired with any LLM to form a composed agent.
# The model string joins the two with "+": coordinate generation model first,
# then "gemini/gemini-1.5-pro" as the plan generation LLM.
agent = ComputerAgent("huggingface-local/HelloKKMe/GTA1-7B+gemini/gemini-1.5-pro", tools=[computer])
agent.run("Open Firefox and go to github.com")
# Success! 🎉
```
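
The composed model string in the example joins two model identifiers with `+`, the coordinate generation model first and the plan generation model second. The sketch below shows one way such a string could be split. The function `split_composed_model` is a hypothetical illustration of the naming convention, not Cua's actual parsing logic.

```python
from typing import Optional, Tuple


def split_composed_model(model: str) -> Tuple[str, Optional[str]]:
    """Split 'grounding+planning' into its two parts (hypothetical helper).

    Returns (grounding_model, planning_model). planning_model is None when
    the string names a single model and contains no "+".
    """
    grounding, sep, planning = model.partition("+")
    if sep and (not grounding or not planning):
        raise ValueError(f"malformed composed model string: {model!r}")
    return grounding, (planning if sep else None)


grounding, planning = split_composed_model(
    "huggingface-local/HelloKKMe/GTA1-7B+gemini/gemini-1.5-pro"
)
print(grounding)  # huggingface-local/HelloKKMe/GTA1-7B
print(planning)   # gemini/gemini-1.5-pro
```

A string without `+`, such as the UI-TARS identifier used earlier, yields `None` for the planning part in this sketch, reflecting that a unified model handles both roles itself.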
